 virtual cell


Large Language Models Meet Virtual Cell: A Survey

arXiv.org Artificial Intelligence

Large language models (LLMs) are transforming cellular biology by enabling the development of "virtual cells"--computational systems that represent, predict, and reason about cellular states and behaviors. This work provides a comprehensive review of LLMs for virtual cell modeling. We propose a unified taxonomy that organizes existing methods into two paradigms: LLMs as Oracles, for direct cellular modeling, and LLMs as Agents, for orchestrating complex scientific tasks. We identify three core tasks--cellular representation, perturbation prediction, and gene regulation inference--and review their associated models, datasets, evaluation benchmarks, as well as the critical challenges in scalability, generalizability, and interpretability.


SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics

arXiv.org Artificial Intelligence

Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice, to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data through capturing and integrating multi-scale information.


Virtual Cells: Predict, Explain, Discover

arXiv.org Artificial Intelligence

Drug discovery is fundamentally a process of inferring the effects of treatments on patients, and would therefore benefit immensely from computational models that can reliably simulate patient responses, enabling researchers to generate and test large numbers of therapeutic hypotheses safely and economically before initiating costly clinical trials. Even a more specific model that predicts the functional response of cells to a wide range of perturbations would be tremendously valuable for discovering safe and effective treatments that successfully translate to the clinic. Creating such virtual cells has long been a goal of the computational research community that unfortunately remains unachieved given the daunting complexity and scale of cellular biology. Nevertheless, recent advances in AI, computing power, lab automation, and high-throughput cellular profiling provide new opportunities for reaching this goal. In this perspective, we present a vision for developing and evaluating virtual cells that builds on our experience at Recursion. We argue that in order to be a useful tool to discover novel biology, virtual cells must accurately predict the functional response of a cell to perturbations and explain how the predicted response is a consequence of modifications to key biomolecular interactions. We then introduce key principles for designing therapeutically-relevant virtual cells, describe a lab-in-the-loop approach for generating novel insights with them, and advocate for biologically-grounded benchmarks to guide virtual cell development. Finally, we make the case that our approach to virtual cells provides a useful framework for building other models at higher levels of organization, including virtual patients. We hope that these directions prove useful to the research community in developing virtual models optimized for positive impact on drug discovery outcomes.


A Virtual Cell Is a 'Holy Grail' of Science. It's Getting Closer.

The Atlantic - Technology

The human cell is a miserable thing to study. Tens of trillions of them exist in the body, forming an enormous and intricate network that governs every disease and metabolic process. Each cell in that circuit is itself the product of an equally dense and complex interplay among genes, proteins, and other bits of profoundly small biological machinery. Our understanding of this world is hazy and constantly in flux. As recently as a few years ago, scientists thought there were only a few hundred distinct cell types, but new technologies have revealed thousands (and that's just the start).


How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

arXiv.org Artificial Intelligence

The cell is arguably the smallest unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of AI-powered Virtual Cells, where robust representations of cells and cellular systems under different conditions are directly learned from growing biological data across measurements and scales. We discuss desired capabilities of AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions is within reach.


E(n) Equivariant Topological Neural Networks

arXiv.org Artificial Intelligence

Graph neural networks excel at modeling pairwise interactions, but they cannot flexibly accommodate higher-order interactions and features. Topological deep learning (TDL) has emerged recently as a promising tool for addressing this issue. TDL enables the principled modeling of arbitrary multi-way, hierarchical higher-order interactions by operating on combinatorial topological spaces, such as simplicial or cell complexes, instead of graphs. However, little is known about how to leverage geometric features such as positions and velocities for TDL. This paper introduces E(n)-Equivariant Topological Neural Networks (ETNNs), which are E(n)-equivariant message-passing networks operating on combinatorial complexes, formal objects unifying graphs, hypergraphs, simplicial, path, and cell complexes. ETNNs incorporate geometric node features while respecting rotation and translation equivariance. Moreover, ETNNs are natively ready for settings with heterogeneous interactions. We provide a theoretical analysis to show the improved expressiveness of ETNNs over architectures for geometric graphs. We also show how several E(n) equivariant variants of TDL models can be directly derived from our framework. The broad applicability of ETNNs is demonstrated through two tasks of vastly different nature: i) molecular property prediction on the QM9 benchmark and ii) land-use regression for hyper-local estimation of air pollution with multi-resolution irregular geospatial data. The experiment results indicate that ETNNs are an effective tool for learning from diverse types of richly structured data, highlighting the benefits of principled geometric inductive bias.
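To make the equivariance property mentioned in this abstract concrete, here is a minimal sketch in Python with NumPy. It is not the ETNN architecture: it is a simplified, graph-only message-passing step in the style of earlier E(n)-equivariant graph networks, and the function name `egnn_style_layer`, the weight shapes, and the random test data are all assumptions made for illustration. The sketch checks numerically that rotating and translating the input positions leaves the feature output unchanged and transforms the position output in the same way.

```python
import numpy as np

def egnn_style_layer(h, x, w_msg, w_pos):
    """One simplified E(n)-equivariant step on a fully connected graph.

    Hypothetical illustration only, not the ETNN layer from the paper.
    h: (N, F) invariant node features, x: (N, D) node positions,
    w_msg: (2F + 1, F) message weights, w_pos: (F, 1) position weights.
    """
    n = h.shape[0]
    diff = x[:, None, :] - x[None, :, :]                # (N, N, D) relative positions
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)   # (N, N, 1) invariant distances
    h_i = np.repeat(h[:, None, :], n, axis=1)           # receiver features
    h_j = np.repeat(h[None, :, :], n, axis=0)           # sender features
    # Messages depend only on invariant quantities (features and distances).
    m = np.tanh(np.concatenate([h_i, h_j, dist2], axis=-1) @ w_msg)
    m = m * (1.0 - np.eye(n)[:, :, None])               # drop self-messages
    # Positions move along relative vectors, scaled by an invariant coefficient.
    x_new = x + np.sum(diff * (m @ w_pos), axis=1) / (n - 1)
    h_new = h + np.sum(m, axis=1)
    return h_new, x_new

rng = np.random.default_rng(0)
n_nodes, n_feat, n_dim = 5, 4, 3
h = rng.normal(size=(n_nodes, n_feat))
x = rng.normal(size=(n_nodes, n_dim))
w_msg = rng.normal(size=(2 * n_feat + 1, n_feat))
w_pos = rng.normal(size=(n_feat, 1))

# A random orthogonal matrix (rotation or reflection) and a random translation.
q, _ = np.linalg.qr(rng.normal(size=(n_dim, n_dim)))
t = rng.normal(size=(1, n_dim))

h_a, x_a = egnn_style_layer(h, x, w_msg, w_pos)           # transform after the layer
h_b, x_b = egnn_style_layer(h, x @ q.T + t, w_msg, w_pos) # transform before the layer

features_invariant = np.allclose(h_a, h_b)
positions_equivariant = np.allclose(x_a @ q.T + t, x_b)
```

The design choice that makes this work is that messages are built only from quantities that do not change under rotation or translation (node features and squared distances), while position updates are sums of relative position vectors, which rotate together with the input.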


How a Yeast Cell Helps Crack Open the "Black Box" Behind Artificial Intelligence

#artificialintelligence

"It seems like every time you turn around, someone is talking about the importance of artificial intelligence and machine learning," said Trey Ideker, PhD, University of California San Diego School of Medicine and Moores Cancer Center professor. "But all of these systems are so-called 'black boxes.' They can be very predictive, but we don't actually know all that much about how they work." Ideker gives an example: machine learning systems can analyze the online behaviors of millions of people to flag an individual as a potential "terrorist" or "suicide risk." "Yet we have no idea how the machine reached that conclusion," he said.


Virtual Cell Can Simulate Cellular Growth Using Machine Learning

#artificialintelligence

Scientists have created a virtual yeast cell model that can learn from real-world behaviors, a key step in utilizing artificial intelligence in healthcare to diagnose diseases. A team of researchers from the University of California San Diego has developed what they called a "visible" neural network that enabled them to build DCell--a machine learning model of a functioning brewer's yeast cell that is commonly used in basic research. Machine learning systems are built on a neural network that consists of layers of artificial neurons that are tied together by seemingly random connections between neurons. The systems "learn" by fine-tuning those connections. In DCell, the researchers amassed all knowledge of cell biology in one place and created a hierarchy of the cellular components.